-
Notifications
You must be signed in to change notification settings - Fork 849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace ScalarBuffer in Parquet with Vec (#1849) (#5177) #5178
Conversation
@@ -339,8 +332,8 @@ where | |||
Some(keys) => { | |||
// Happy path - can just copy keys | |||
// Keys will be validated on conversion to arrow | |||
let keys_slice = keys.spare_capacity_mut(range.start + len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually incorrect, but didn't matter as spare_capacity_mut didn't update the length of the buffer directly, and so this would just potentially allocate more space than necessary
use arrow_data::ArrayDataBuilder; | ||
use arrow_schema::{DataType as ArrowType, TimeUnit}; | ||
use std::any::Any; | ||
use std::sync::Arc; | ||
|
||
/// Provides conversion from `Vec<T>` to `Buffer` | ||
pub trait IntoBuffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole module is crate private, so this isn't a breaking change
/// | ||
/// [scalar]: https://doc.rust-lang.org/book/ch03-02-data-types.html#scalar-types | ||
/// | ||
pub trait ScalarValue: Copy {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module is crate-private and so this is not a breaking change
/// instead a subsequent call should be made to [`BufferQueue::set_len`] | ||
fn spare_capacity_mut(&mut self, batch_size: usize) -> &mut Self::Slice; | ||
/// instead a subsequent call should be made to [`BufferQueue::truncate_buffer`] | ||
fn get_output_slice(&mut self, batch_size: usize) -> &mut Self::Slice; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted to rename these methods so they didn't collide with methods on Vec. Eventually the plan with #5178 is to remove the need for this trait entirely
As an added bonus this appears to yield some non-trivial performance improvements
|
Which issue does this PR close?
Closes #1849
Part of #5177
Rationale for this change
Following #3756 it is possible to construct arrow arrays directly from
Vec
without copying. This PR updates parquet to do this, not only reducing the amount of code, but opening the door to pushingVec
intoColumnReader
proper (#5177)What changes are included in this PR?
Are there any user-facing changes?